140 research outputs found

    Circular Languages Generated by Complete Splicing Systems and Pure Unitary Languages

    Circular splicing systems are a formal model of a generative mechanism of circular words, inspired by a recombinant behaviour of circular DNA. Some unanswered questions are related to the computational power of such systems, and finding a characterization of the class of circular languages generated by circular splicing systems is still an open problem. In this paper we solve this problem for complete systems, which are special finite circular splicing systems. We show that a circular language L is generated by a complete system if and only if the set Lin(L) of all words corresponding to L is a pure unitary language generated by a set closed under the conjugacy relation. The class of pure unitary languages was introduced by A. Ehrenfeucht, D. Haussler and G. Rozenberg in 1983, as a subclass of the class of context-free languages, together with a characterization of regular pure unitary languages by means of a decidable property. As a direct consequence, we characterize (regular) circular languages generated by complete systems. We can also decide whether the language generated by a complete system is regular. Finally, we point out that complete systems have the same computational power as finite simple systems, an easy type of circular splicing system defined in the literature from the very beginning, when only one rule is allowed. From our results on complete systems, it follows that finite simple systems generate a class of context-free languages containing non-regular languages, showing the incorrectness of a longstanding result on simple systems.
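The conjugacy relation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that a circular word is represented by any one of its linear rotations and that Lin(L) collects all rotations of all circular words in L. The function names are hypothetical and do not come from the paper.

```python
def conjugates(word):
    """Return all rotations (conjugates) of a linear word."""
    if not word:
        return {word}
    return {word[i:] + word[:i] for i in range(len(word))}


def linearization(circular_words):
    """Assumed reading of Lin(L): the union of the conjugates of each
    representative of a circular word."""
    result = set()
    for w in circular_words:
        result |= conjugates(w)
    return result


def is_closed_under_conjugacy(words):
    """Check that every rotation of every word is also in the set."""
    words = set(words)
    return all(conjugates(w) <= words for w in words)
```

For example, `linearization({"ab"})` yields both `"ab"` and `"ba"`, and only the resulting set (not `{"ab"}` alone) is closed under conjugacy.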

    PIntron: a Fast Method for Gene Structure Prediction via Maximal Pairings of a Pattern and a Text

    Current computational methods for exon-intron structure prediction from a cluster of transcript (EST, mRNA) data do not exhibit the time and space efficiency necessary to process large clusters of more than 20,000 ESTs and genes longer than 1 Mb. Guaranteeing both accuracy and efficiency seems to be a computational goal that is far from being achieved, since accuracy is strictly related to exploiting the inherent redundancy of information present in a large cluster. We propose a fast method for the problem that combines two ideas: a novel algorithm of provably small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions that are highly confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called the Embedding Graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the size of P, T and of the output. PIntron, the software tool implementing our methodology, is able to process in a few seconds some critical genes that are not manageable by other gene structure prediction tools. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when compared with ENCODE data. Detailed experimental data, additional results and PIntron software are available at http://www.algolab.eu/PIntron
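The notion of a maximal pairing of a pattern P and a text T can be sketched as follows. This assumes a maximal pairing is a triple (p, t, length) marking a common substring of P and T that cannot be extended to the left or to the right. The sketch is a naive quadratic-time enumeration for illustration only. It is not the linear-time procedure the abstract describes, and the function name and `min_len` parameter are hypothetical.

```python
def maximal_pairings(pattern, text, min_len=1):
    """Enumerate (p, t, length) with pattern[p:p+length] == text[t:t+length]
    such that the match cannot be extended in either direction."""
    pairings = []
    for p in range(len(pattern)):
        for t in range(len(text)):
            if pattern[p] != text[t]:
                continue
            # Skip starts that could be extended to the left.
            if p > 0 and t > 0 and pattern[p - 1] == text[t - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (p + length < len(pattern)
                   and t + length < len(text)
                   and pattern[p + length] == text[t + length]):
                length += 1
            if length >= min_len:
                pairings.append((p, t, length))
    return pairings
```

With the toy inputs `pattern = "ACGT"` and `text = "ACGGT"`, the pairings are `(0, 0, 3)` for `ACG` and `(2, 3, 2)` for `GT`. Pairings of this kind are what the abstract names as the vertices of the Embedding Graph.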

    Migration and Legal Uncertainty during the Pandemic: A Qualitative Study of the Italian Case (Migracije in pravna negotovost med pandemijo: Kvalitativna študija italijanskega primera)

    The COVID-19 pandemic has unequally impacted the lives of Italian subjects. The article uses evidence from forty-seven semi-structured interviews with various migrant groups to illuminate how temporalities embedded in Italy’s migration governance shape migrants’ precarious legal status and access to welfare. The authors show that whereas migrants with secure legal status or citizenship have not engaged significantly with Italian bureaucracies, they have no easy access to welfare as it is contingent on their employment and financial status. Migrants with precarious status have been the worst hit by the pandemic’s secondary effects across several fronts. These findings have implications for policy and future research.

    The complexity of multiple sequence alignment with SP-score that is a metric

    This paper analyzes the computational complexity of computing the optimal alignment of a set of sequences under the sum of all pairs (SP) score scheme. We solve an open question by showing that the problem is NP-complete in the very restricted case in which the sequences are over a binary alphabet and the score is a metric. This result establishes the intractability of multiple sequence alignment under a score function of mathematical interest, which has indeed received much attention in biological sequence comparison.
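As a minimal sketch of the sum of all pairs (SP) score named in this abstract, the following evaluates a given alignment by adding up a column-wise cost over every pair of rows. The unit cost used here (0 for equal symbols, 1 otherwise, with `-` as the gap symbol) is an assumption chosen because it is a metric. The paper's specific score function is not given in the abstract. Evaluating an alignment is easy; the hardness result concerns finding the optimal one.

```python
from itertools import combinations


def unit_cost(x, y):
    """Discrete metric on symbols, gap included (an illustrative choice)."""
    return 0 if x == y else 1


def sp_score(alignment, cost=unit_cost):
    """Sum the column-wise cost over all pairs of aligned rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        for x, y in zip(row_a, row_b):
            total += cost(x, y)
    return total
```

For the binary alignment `["0110", "01-0", "0-10"]`, the three row pairs contribute 1, 1 and 2, so `sp_score` returns 4.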

    Pure Parsimony Xor Haplotyping

    Haplotype resolution from xor-genotype data has recently been formulated as a new model for genetic studies. Xor-genotype data is a cheaply obtainable type of data distinguishing heterozygous from homozygous sites without identifying the homozygous alleles. In this paper we propose a formulation based on a well-known model used in haplotype inference: pure parsimony. We exhibit exact solutions of the problem by providing polynomial time algorithms for some restricted cases and a fixed-parameter algorithm for the general case. These results are based on some interesting combinatorial properties of a graph representation of the solutions. Furthermore, we show that the problem has a polynomial time k-approximation, where k is the maximum number of xor-genotypes containing a given SNP. Finally, we propose a heuristic and produce an experimental analysis showing that it scales to real-world large instances taken from the HapMap project.
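The xor-genotype model in this abstract can be illustrated with a small sketch. It assumes haplotypes are binary tuples and that the xor-genotype of a pair of haplotypes is their site-wise XOR, so that 1 marks a heterozygous site and 0 a homozygous one. The checker below only verifies whether a candidate haplotype set explains a list of xor-genotypes. It does not search for a minimum set, which is the pure parsimony objective. Function names are hypothetical.

```python
def xor_genotype(h1, h2):
    """Site-wise XOR of two equal-length binary haplotypes."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same number of sites")
    return tuple(a ^ b for a, b in zip(h1, h2))


def resolves(haplotypes, genotypes):
    """True if every xor-genotype is produced by some pair of haplotypes
    (a haplotype paired with itself gives the all-zero genotype)."""
    produced = {xor_genotype(a, b) for a in haplotypes for b in haplotypes}
    return all(tuple(g) in produced for g in genotypes)
```

For instance, the three haplotypes `(0, 0, 0)`, `(1, 0, 1)` and `(0, 1, 1)` resolve the xor-genotypes `(1, 0, 1)`, `(0, 1, 1)` and `(1, 1, 0)`, but not `(1, 1, 1)`.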